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Abstract 

Background: Traditional anthropometric studies of human face rely on manual measurements of simple features, 
which are labor intensive and lack of full comprehensive inference. Dense surface registration of three-dimensional 
(3D) human facial images holds great potential for high throughput quantitative analyses of complex facial traits. 
However there is a lack of automatic high density registration method for 3D faical images. Furthermore, current 
approaches of landmark recognition require further improvement in accuracy to support anthropometric 
applications. 

Result: Here we describe a novel non-rigid registration method for fully automatic 3D facial image mapping. This 
method comprises two steps: first, seventeen facial landmarks are automatically annotated, mainly via PCA-based 
feature recognition following 3D-to-2D data transformation. Second, an efficient thin-plate spline (TPS) protocol is 
used to establish the dense anatomical correspondence between facial images, under the guidance of the 
predefined landmarks. We demonstrate that this method is highly accurate in landmark recognition, with an 
average RMS error of -1.7 mm. The registration process is highly robust, even for different ethnicities. 

Conclusion: This method supports fully automatic registration of dense 3D facial images, with 17 landmarks 
annotated at greatly improved accuracy. A stand-alone software has been implemented to assist high-throughput 
high-content anthropometric analysis. 

Keywords: 3D face, Facial morphology, Registration, Landmark localization, Dense correspondence 



Background 

Large-scale, high-throughput phenotyping is becoming 
increasingly important in the post-genomics era. Ad- 
vanced image processing technologies are used more 
and more for collecting deep and comprehensive mor- 
phological data from different organisms, such as yeast 
[1], plants [2], worm [3] as well as mice [4]; and for dif- 
ferent body parts such as brain [5,6], lung [7] and face 
[8-13]. Especially for the brain 3D image registration, a 
novel elastic registration (HAMMER) of magnetic reson- 
ance images of the brain has greatly facilitated the 
medical research of brain [6,14]. A recent work that 
combined florescent labeling and non-rigid registration 
achieved registration accuracy up to 2 um in drosophila 
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brain, which potentially allows functional analyses at 
individual neuron resolution [5]. The soft tissue of the 
human face is a complex geometric surface composed of 
many important organs, including eyes, nose, ears, mouth, 
etc. Given its essential biological functions, the human face 
has been a key research subject in a wide range of fields in- 
cluding anthropology [15], medical genetics [8,9,16,17], fo- 
rensics [18,19], psychology [20,21], aging [22,23] and the 
upcoming quantitative genomics [24,25], etc.. Nonetheless, 
for a long period of time period the rich quantitative traits 
of face have not been made full use of. Previous anthropo- 
metric studies have been largely based on tedious manual 
measurements of dozens of distances between a set of 
landmarks, which were subjectively determined by the 
observers' eyes and were thus error prone and sensitive 
to individual differences [26-28]. In the past few years, 
efforts have been paid to discover the genetic determi- 
nants of normal facial variations either by examining 
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candidate genes or via genome-wide association studies 
[24,25,29-33]. Although high resolution 3D images were 
taken in some of these studies, the landmarks were still 
manually annotated [25]; and simple landmark coordi- 
nates or landmark-distances were used as the major 
phenotype data [24,25,33]. Such practices unavoidably 
led to loss of resolution and statistic power. In short, 
the lack of quantitative methods to capture the facial 
morphology in high definition and full automation has 
hindered the biological researches in human face. In 
fact, in the field of computer vision, many technologies 
have been developed for landmark recognition and dense 
point registration. Nonetheless, few have been successfully 
applied in the biological studies of human face. This is 
largely due to the different requirements between com- 
puter vision and biological studies. Quantitative biological 
analyses of face require the face registration to follow the 
principle of anatomical correspondence and landmarks 
have to be localized at high accuracies. However such rules 
are often not satisfied in the computer vision methods. For 
image registration, many methods rely on rigid transform- 
ation, such as the Iterative Closest Point (ICP) [34]. ICP 
uses affine transformations, including rotation and transla- 
tion to find the closest corresponding points between two 
surfaces. The registration based on ICP does not fully cap- 
ture the anatomical variability, especially when faces to be 
compared differ significantly in shape or expression. For 
landmark localization, there exist many automatic methods 
[18,35-39]. Landmark localization methods based on ICP 
suffer from the intrinsic incompatibility with anatomical 
correspondence [40,41]. At some point, many landmark 
localization approaches use local, curvature-based facial 
features due to their invariance to surface translation and 
rotation. The two most frequently adopted features are the 
HK curvature and the shape index [35-37,42]. However, 
curvature-based descriptors often suffer from surface ir- 
regularities, especially near eye and mouth corners [43]. 
Other studies have used appearance-based methods 
where the facial features are modeled by basis vectors 
calculated from transformations such as Principal Com- 
ponent Analysis (PC A) [44,45], Gabor wavelets [46,47], 
or the Discrete Cosine Transform (DCT) [39]. However, 
the lowest mean localization errors (root mean square 
error, RMS) these approaches can achieve were around 
3-5 mm [18,35-39], not accurate enough for high reso- 
lution morphometric analyses. 

For biological inference, anatomical correspondence 
has to be established. This can be achieved by non-rigid 
transformations. A common method for deforming 3D sur- 
faces is the thin-plate spline (TPS) algorithm [48]. The 
process of using TPS warping involves minimizing a bend- 
ing energy function for a transformation over a set of fidu- 
cial points (landmarks), thereby bringing the corresponding 
fiducial points on each surface into alignment with each 



other. A dense registration method has been developed 
based on TPS, and was successfully used to detect many 
face dysmorphology caused by rare genetic defects such as 
Noonan, 22qll deletion, Bardet-Biedl and Smith-Magenis 
syndromes [8-13]. This approach therefore demonstrated 
great importance in biological and medical research. How- 
ever, the TPS based registration has a key limitation that 
restrained its wide use in large-scale 3D facial datasets: 
A set of landmarks have to be localized as the anchoring 
points before TPS can be carried out. Methods have 
been developed to combine ICP-based landmark anno- 
tation and TPS warping to fully automate the registra- 
tion [40,41]. However, the landmark correspondences 
found by ICP are not exactly anatomically homologous, 
as previously discussed. 

In this study, we develop an automatic registration 
method which combines a novel solution of landmark 
localization and an efficient protocol of TPS -based sur- 
face registration. The landmark localization mainly em- 
ploys PCA to extract landmarks on surfaces by use of 
both shape and texture information. For the surface 
registration, a new TPS warping protocol that avoids the 
complication of inverse TPS warping (a compulsory pro- 
cedure in the conventional registration method) is used 
to resample the meshes according to the reference mesh. 
We show that this method is highly accurate and robust 
accross different ethnicities. We also propose a new 
spherical resampling algorithm for re-meshing surfaces 
which efficiently removes the caveats and improves the 
mesh structure. Furthermore, the associated texture is 
also included in the registered data for visualization and 
various analyses. 

Methods 

Ethics statement 

Sample collection in this study was carried out in ac- 
cordance with the ethical standards of the ethics com- 
mittee of the Shanghai Institutes for Biological Sciences 
(SIBS) and the Declaration of Helsinki, and has been 
specifically surveyed and approved by SIBS. A written 
statement of informed consent was obtained from every 
participant, with his/her authorizing signature. The par- 
ticipants, whose transformed facial images are used in 
this study as necessary illustrations of our methodology, 
have been shown the manuscript and corresponding 
figures. Aside from the informed consent for data sam- 
pling, a consent of publication was shown and explained 
to each participant and their authorizing signature was 
obtained as well. 

The 3D face data set 

Three-dimensional facial images were acquired from indi- 
viduals of age 18 to 28 years old, among which 316 (114 
males and 202 females) were Uyghurs from Urumqi, China 
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and 684 (363 males and 321 females) were Han Chinese 
from Taizhou, Jiangsu Province, China. Another training 
set which did not overlap with the first 1000 sample faces, 
consisted of 80 Han Chinese, 40 males and 40 females 
from Taizhou, Jiangsu Province, China. The participants 
were asked to pose an approximately neutral facial expres- 
sion, and the 3D pictures were taken by the 3dMDface @ 
system (www.3dmd.com). Each facial surface was repre- 
sented by a triangulated, dense mesh consisting of -30000 
vertices, with associated texture (Figure 1). 

Workflow 

The workflow is briefly described as follows (Figure 2). 
Starting with a set of raw 3D face scans, the nose tip is 
first automatically localized on each face using a sphere 
fitting approach and pose normalization is performed to 
align all sample faces to a uniform frontal view. For the 
landmark annotation, the six most salient landmarks were 
first manually labeled on a set of training samples; Princi- 
pal Component Analysis (PCA) was then employed to 
localize these 6 landmarks on the sample surfaces and 11 
additional landmarks were heuristically annotated after- 
wards. A reference face was then chosen, re-meshed using 
spherical sampling, and TPS -warped to each sample face 
using the 17 landmarks as fiducial points. A dense, bio- 
logical correspondence was thus built by re-meshing the 
sample face according to the reference face. The corres- 
pondence is further improved by using the average face as 
the reference and repeating the registration process. 

Preliminary nose tip localization and pose normalization 

In 3D facial image processing, pose normalization and 
landmark localization are highly dependent on each 
other since pose normalization is typically guided by 
landmarks. The features commonly used for pose cor- 
rection are the nose tip and inner eye corners as they 
are easier to detect [35], less sensitive to pose variation, 
and invariant to facial expressions [42,44,49,50]. On the 
other hand, most existing landmark localization ap- 
proaches rely on the assumption of frontal or approxi- 
mately frontal poses and are therefore sensitive to roll 



and yaw rotation [38,42,51]. In order to fully automate 
the pose normalization and landmark annotation, we 
first identify the most robust and prominent landmark, 
the nose tip. 

Since the area around the nose tip can be approxi- 
mated as a semi-sphere with a diameter specific to nose, 
we try to identify the nose tip by fitting a sphere around 
every vertex using its surrounding vertices. A vertex is 
likely the nose tip if its neighboring points fit a sphere 
very well and the sphere diameter approaches the specific 
value of the nose tip. As this method is insensitive to the 
pose of the face, the spherical area on nose tip can be seen 
as a rotation invariant descriptor (RID). The algorithm is 
described as follows. Let us denote a facial mesh com- 
posed of N points by F = {pj for i = 1, N. Suppose S is 
the set of M points that are within distance R around the 
point pj (l<j<N). The best fit sphere T around pj is 
therefore determined by two parameters, namely the cen- 
ter O = (a, by c) and radius r. Another parameter e is the 
average residual distance of the M points to the best fit 
sphere, e describes how well the set of M points fit onto a 
sphere. A detailed description of sphere fitting and the cal- 
culation of e can be found in Additional file 1. The smaller 
e is, the better S fits a sphere. The two parameters, r and 
e, are calculated for every point. In order to form a proper 
sphere of radius r around each vertex, the included dis- 
tance to adjacent points (R) must be slightly larger than 
the radius of the sphere (r) as it is assumed that not every 
point will lie on the sphere. On the other hand, r should 
be chosen with good consistency across genders and eth- 
nic backgrounds, thereby establishing a uniform criterion 
for all faces. To determine the optimal R and r, we ran- 
domly chose 50 face images from each of the four groups: 
Han male, Han female, Uyghur male and Uyghur female. 
All the 200 images were manually annotated for the nose 
tip. For decreasing R radius values starting at 18 mm (a 
value more than big enough to cover the nose tip region), 
the sphere fitting was carried out and the best fit r values 
were obtained for each image. For every R value, the 
average r was then calculated for each group as well as for 
all four groups (Additional file 2: Figure SI). The global 




Figure 1 The surface used in our research, (a) The coordinate system used in our research (red, green and blue axes stand for x, y and z axes 
respectively), (b) An example scan with 17 landmarks marked by the colored spots. The red spots are the 6 most salient landmarks, namely the 
inner and outer corners of the eyes and both corners of the mouth, the blue spots indicate the other 1 1 landmarks used in this study, (c) Raw 
mesh details around the nose tip. 
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Figure 2 The workflow of the analysis. 



average r value of 11.7 mm were chosen for the lowest 
variance across different sexes and ethnicities, denoted as 
r 0 ; and the corresponding R of 15 mm was also chosen 
and denoted as R 0 . The two spherical parameters can then 
be combined with the optimal r 0 radius into one statistic 
(f) which describes how well a given point fits the criteria 
for a nose tip: 



/ = e(r 0 + \r-r 0 \) 



(1) 



The /value should be very small around the nose tip 
region. Indeed, we found that small /values congregated 
around the nose tip area (Figure 3). More interestingly, 
the global minima of the /values consistently appeared 
close to the manually annotated nose tip across hundreds 
of observations. We therefore use the point with the mini- 
mum /value to approximate the nose tip (Figure 3). The 
left column, a Han Chinese male in the middle column 
and an Uyghur female in the right column. Top row, the / 
values are shown as color gradients. Warm colors indicate 
convex sphere fitting, while the cold colors indicate con- 
cave to the reader. The /values deviating more from 0 are 
marked with greater color intensity. Central row, the 
minimum convex /values plotted for different individuals, 



which can be seen to coincide with the manually anno- 
tated nose tips shown in the bottom row. 

The pose correction becomes easy once the nose tip 
has been located. Correcting the pose basically consists 
of resetting the viewing coordinate system where an ori- 
gin point and two axes must be defined. In some studies, 
the ICP matches are applied [52,53]. Other studies try to 
find landmarks (i.e. inner eye corners) other than the 
nose tip to determine the pose [38,48]. However, in this 
study we followed a rather practical solution in which all 
vertices within 50mm of the nose tip are used to correct 
the pose via the Hotelling transformation [52,53]. 

Localization of the six most salient landmarks using PCA 

Here we propose a novel landmark localization method. 
The basic idea is to transform the 3D shape and texture 
data into a 2D space. A 2D PCA algorithm is then used to 
identify the six most salient landmarks, namely the inner 
and outer corners of the eyes and both corners of the 
mouth (Figure lb). First, the image texture is converted to 
the YCbCr color space, in which the Y component defining 
the gray scale intensity is calculated as y = 0.30r + 0.59 + 
0.11b. Only the gray scale values are used as color infor- 
mation for this step. For any 3D face image, the plane 
defined by the x and y axes is defined as the target 2D 
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Figure 3 Nose tip localization using RID illustrated for three individuals, a Han Chinese female in the left column, a Han Chinese male in 
the middle column and an Uyghur female in the right column. Top row, the f values are shown as color gradients. Warm colors indicate convex 
sphere fitting, while the cold colors indicate concave to the reader. The f values deviating more from 0 are marked with greater color intensity. 
Central row, the minimum convex f values plotted for different individuals, which can be seen to coincide with the manually annotated nose tips 
shown in the bottom row. 



space. The 3D surface and its corresponding 2D texture 
are then resampled on a uniform square grid at a 1mm 
resolution to obtain the corresponding z coordinate values 
and gray scale values. These values are directly mapped to 
the target 2D space (Figure 4). In order to minimize the 
data noise, the z coordinate and gray scale values are de- 
noised using a 3 x 3 decision-based median filter [37] . 
Only the values of the outer most layer are transformed to 
2D following the z buffering algorithm, particularly for the 
areas where the 3D surface folds into a multilayer along 



the z-axis [54]. Holes that may occur inside the surface 
are closed by bicubic interpolation as previously de- 
scribed [55]. The interpolation process was done separ- 
ately on texture and the 2.5D image data. The resulting 
2D image combines both shape and texture information, 
which serves as the basis for the PCA-based landmark 
localization. The PCA analysis is a commonly used 
approach for accurate pattern recognition in 2D data 
[54,56,57]. It involves retrieving the feature signatures 
by dissecting the training data with PCA, followed by 





Figure 4 The signature patch for the left lip corner illustrated in the 2D space, (a) The z coordinate values mapped into the 2D space, (b) 
The gray scale values mapped into the 2D space. 



Guo et al. BMC Bioinformatics 2013, 14:232 
http://www.biomedcentral.com/1471-2105/14/232 



Page 6 of 12 



projecting the sample data into the PCA eigenspace to 
determine the similarity. In this study, the landmark sig- 
nature is obtained by defining a patch of a given size, 
say smm x smm, centered around the manually anno- 
tated landmark in the training set (Figure 4). Each patch 
therefore contains s 2 z coordinate values, which are then 
concatenated into a vector and normalized to have zero 
mean and unit length. We define it as Z = (z±, z 2 ,.-.,z sxs ). 
The same number of gray scale values are also 
concatenated into a vector and normalized to have unit 
length. We define it as Y= (yi,y2>--->ysxs)- Z an d 3^ can be 
combined together to specify the shape and texture 
properties around the landmark: 

P = (z u y 1 ,z 2 ,y 2 , '-,z sxs1 y sxs ) T (2) 

P is then calculated for the signature eigenspace U 
using PCA (see Additional file 1 for details). To find the 
landmarks in a sample face, a patch of s mmxs mm is 
similarly defined for every point in the corresponding 
2D grid, and a sample patch vector P s is derived follow- 
ing equation (2). P s is subsequently projected to the 
space U to evaluate its closeness to the origin point of U. 
In this study, two measurements of closeness are used, the 
reconstruction error e and the Mahalanobis distance d 
(see Additional file 1 for details). Sample points with 
smaller values for e and d are more likely to be a valid 
landmark. Therefore, the sample point corresponding to 
the minimum product value of e and d is defined as the 
landmark in our work. The patch size parameter s inevit- 
ably affects the final localization accuracy. We formally 
evaluated the dependence of the accuracy on the patch 
size, as illustrated in Additional file 3: Figure S2. Briefly, 
we checked the distances of the automatically annotated 
landmarks to the manually annotated ones, which we de- 
fine as the localization error, for a random set of 100 indi- 
viduals of different ethnicities and genders (25 individuals 
from each of the four groups: Han male, Han female, 
Uyghur male and Uyghur female). We found that in 
general, the localization error decreases with patch size 
(Additional file 3: Figure S2). However the error reaches a 
minimum when s is around 21mm and further increasing 
in the patch size does not reduce the error. Therefore we 
use the s value of 21mm throughout this study. To further 
optimize the computational efficiency, we narrow down 
the search for each landmark to a corresponding "land- 
mark zone" on each sample face. Briefly, an arithmetic 
mean is calculated for each landmark across the training 
set, and projected onto the 2D space. Rectangular areas 
around the projection points are then defined as the 
landmark zones, with their sizes set experimentally (i.e. 
by training through a large number of faces) to ensure 
all real landmarks are encompassed. Therefore, the 



search for a particular landmark is done only within this 
landmark zone. 

Heuristic localization of ten additional landmarks 

Given the annotation of the six most salient landmarks, 
the pose of the surface can be fine tuned again. The ref- 
erence plane is set to be the best fit plane to the six 
landmarks by least squares. The normal to the reference 
plane is set to be the z axis, and the y axis is given by 
the projection of the line going through the centers of 
lip corners and the eye corners onto the reference plane. 
The x axis is uniquely determined afterwards. 

After the pose correction, 10 additional landmarks are 
identified heuristically by using geometric relations and 
texture constraints and the nose tip position is also 
updated. These landmarks include soft tissue nasion, 
alares, subnasale, labiale superius (upper lip point), 
stomion (the middle point between the upper and lower 
lip), labiale inferius (lower lip point), pogonion (chin 
point), and earlobe tips. The nose tip can be fine tuned 
according to the more uniformly defined coordinate sys- 
tem across all sample surfaces. Briefly, a semi-sphere is 
refitted around the previous nose tip and the point that 
minimizes the z coordinate error is chosen as the new 
nose tip. The subnasale point can be located by finding 
the inflection point with the minimum angle right below 
the nose tip. The alare points are the inflection points 
with the minimum local angles going horizontally away 
from the nose tip. Similar angle heuristics are applied to 
the detection of labiale superius, inferius, and stomion, 
with additional texture information in the YCbCr color 
space. For example, the labiale superius should locate 
the position on the border line where the Cr values 
below the line are greater (more red) than those above. 
Noticing that the region around the nasion point is ap- 
proximately saddle-shaped and that of the chin point is 
ellipsoidal or sphere-shaped, both characterized by the 
two-way symmetry, we therefore locate the two points 
by finding the maximum local symmetry scores. The 
earlobe points are easily found by locating the tips with 
sheer slopes along the z-axis. 

Spherical resampling and surface remeshing 

During the 3D image acquisition, the surface meshes often 
suffer from mesh structure irregularities and/or defects 
such as mesh holes (see Figure lc for example). Surface 
remeshing is often used to solve such problems [58]. In 
this work, we apply spherical sampling to the reference 
surface to obtain a well- structured mesh. Spherical sam- 
pling is preferred as human faces are approximately 
ellipsoidal. We first perform a spherical parameterization 
to the surface using the geographic coordinates. Given a 
vertex {x b y b Zj) on the original surface mesh, the spherical 
parameterization (^0^0/) can be obtained as follows: 
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Pi = y/ 2 *}+y}+*} 

di = sxcsmfa/pi) (3) 
= arctanx(\/2^'/^) 



The x-coordinate is multiplied by a factor y/2 before 
the coordinate conversion, to compensate for the face 
aspect ratio (height to width) [53]. When plotted against 
6 and cp, the parameterized surface unfold into a nearly 
flat plane. This surface is then trimmed with an oval 
path to remove the irregular edges and re-sampled from 
a uniform square grid with an interval of 0.005 for both 
0 and cp. The re-sampled data points are then converted 
back to the Cartesian coordinate system to define a new 
surface mesh. 

Surface registration for dense correspondence 

In order to preserve the anatomical correspondence across 
the facial surfaces, we adopted the idea of the TPS -based 
registration method proposed previously [59]. In that 
study, all surfaces were first manually annotated for a set of 
landmarks. The sample surfaces and the reference were all 
TPS warped to the cross-sample average landmarks. Each 
sample surface was then re-meshed by the closest points to 
the reference vertices, and further inverse TPS warped 
back to the original shape. Mathematically, TPS warping is 
not invertible. Although an approximation exists, it is com- 
putationally intensive and error prone [60]. In our study, 
we designed an alternative scheme. First, a well-structured 
surface with few defects is chosen as the reference face, 
and spherically remeshed as described above. Then only 
the reference surface is TPS warped to each sample sur- 
face, taking the 17 landmarks as the fiducial points. The 
TPS warping is done as previously described [13]. There- 
after the vertices on the reference surface find their closest 
projections on the sample surface, which define the new 
mesh vertices of the sample surface [13,61]. The dense cor- 
respondence is established after all the sample surfaces are 
remeshed using the same reference. This approach elimi- 
nates the need for inverse TPS warping, and enhances the 
computational efficiency as well. 

Results 

Accuracy of the landmark localization 

In this section we demonstrate the accuracy of the pro- 
posed algorithm for automatic landmark localization. The 
accuracy is measured by the deviation of the automatically 
annotated landmarks from those manually annotated. 

A subset of the sample surfaces were picked randomly 
and manually annotated with the 17 landmarks by the ex- 
perimenter who did the same to the training set. Automatic 
landmark annotation was also performed independently. 
The surfaces missing some features such as the earlobes 



were removed from further analysis. This left 115 Han 
Chinese (56 males, 59 females) and 124 (48 males, 76 fe- 
males) Uyghur for the evaluation. The mean and standard 
deviation (SD) of the annotation errors measured in Euclid- 
ean distance, as well as the root mean square (RMS) errors 
were calculated (Table 1). As can be seen from Table 1, 
most landmarks have mean errors between 1mm and 
1.5mm, indicating rather high accuracy. Most of the SD 
values are below 1mm, suggesting good consistency of the 
annotation across different samples. The RMS error is 
within the range of 1.1-2 mm for most measurements. 
Greater errors are found for the Pogonion (-1.8 mm mean 
error for both the Han Chinese and Uyghur) and the two 
earlobe tips (mean error 2-3 mm, SD error 1.6 - 2.2 mm 
and RMS error 2.6 - 3.6 mm). Pogonion and earlobes are 
both strongly affected by facial/head hair, which may ac- 
count for the relatively larger errors and standard devia- 
tions. It is worth noticing that all the error values are 
similar between the Han Chinese and Uyghur samples des- 
pite the use of the Han Chinese training set. Given the 
substantial genetic and morphological differences between 
these two ethnic populations, this indicates good robust- 
ness of our method when applied to different ethnicities. 

Robustness of the registration method 

One way to evaluate the robustness of the registration 
method is to determine how the use of different references 
would affect the correspondence mapping. We performed 
such an evaluation, as shown in Figure 5. First, we obtained 
the average Han Chinese male and female faces by register- 
ing all the Han Chinese samples to the same reference sur- 
face, followed by obtaining the average meshes across 
either gender group (average face calculation is explained 
in more detail in the next section). These average faces are 
point-to-point corresponded. We can see the two average 
faces differ substantially in their shape (Figures 5a and c). 
To test the robustness of the registration method, a test 
face (Figure 5b) is chosen randomly from the data set, and 
registered separately using either average face as the refer- 
ence. Euclidian distances are calculated for the pairing 
points between the two newly registered meshes. One ex- 
pects to see small differences between the two registration 
results if the method is robust. Figure 5d shows that most 
parts have point-wise errors much less than 0.9 mm, which 
indicates the robustness of our registration method with 
varying references. Certain regions like eyebrows exhibit 
greater errors, most likely due to the mesh irregularities 
caused by facial hair. 

The average faces calculation with the 3D face 
registration 

We applied the proposed 3D face registration method to 
the whole 3D face sample set. In total 363 male and 321 
female Han Chinese and 114 male and 202 female Uyghur 
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Table 1 Mean error and standard deviation of 17 automatically annotated landmarks 



Landmarks 




Han 






Uyghur 




Mean error (mm) 


SD error (mm) 


RMS error 


Mean error (mm) 


SD error (mm) 


RMS error 


Right eye outer corner 


1.339 


0.947 


1.641 


1.511 


1.091 


1.864 


Right eye inner corner 


1.192 


1.155 


1.660 


1.280 


0.914 


1.573 


Left eye inner corner 


1.162 


0.809 


1.416 


1.489 


0.896 


1.738 


Left eye outer corner 


1.148 


0.738 


1.365 


1.507 


1.008 


1.814 


Right lip corner 


0.995 


0.581 


1.153 


1.424 


0.817 


1.642 


Left lip corner 


1.012 


0.569 


1.161 


1.147 


0.693 


1.341 


Nose tip 


0.761 


0.747 


1.067 


1.113 


0.549 


1.242 


Nasion 


1.487 


0.654 


1.625 


1.604 


0.833 


1.808 


Right alare 


1.310 


0.752 


1.511 


1.034 


0.742 


1.273 


Left alare 


1.480 


0.798 


1.682 


1.128 


0.577 


1.268 


Lip center 


1.189 


0.715 


1.388 


1.145 


0.812 


1.404 


Upper lip 


1.270 


0.769 


1.485 


1.727 


1.138 


2.069 


Lower lip 


1.380 


0.855 


1.624 


1.501 


0.941 


1.772 


Subnasale 


1.299 


1.263 


1.812 


0.999 


0.574 


1.153 


Pogonion 


1.809 


1.878 


2.608 


1.785 


0.882 


1.992 


Right earlobe tip 


2.074 


1.658 


2.656 


2.866 


2.18 


3.601 


Left earlobe tip 


2.678 


1.640 


3.141 


2.835 


2.054 


3.501 



were included in this analysis. All surfaces were automat- 
ically annotated. One Han Chinese face with few caveats 
and fully exposed skin was chosen as the reference, to 
which all the sample faces were registered. The General- 
ized Procrustes Analysis (GPA) was then used to align all 
the registered surfaces to a common coordinate [62]. The 



average faces were then calculated as the average meshes 
colored by the corresponding average texture pixels across 
all the samples in each group. Figure 6 shows the average 
faces of the four groups. As can be seen, the average faces 
well retain the morphological and textural features of each 
group. Specifically, the Uyghur are characterized by a 




Figure 5 Evaluation of the robustness of the registration method. The average face of either gender is used as the reference to register the 
sample surface, and the registration results are compared, (a) The average face of male Han Chinese, (b) The sample face to be registered, (c) 
The average face of female Han Chinese, (d) The comparison of the two registration results. The differences are represented in color gradients, 
with the darker colors denoting greater pointwise differences. 
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more protruding nose and eyebrow ridges while Han 
Chinese have wider cheeks. The skin pigmentation also 
seems lighter for the Uyghur compared to the Han 
Chinese. This difference could not be confirmed as the 
environmental light was not well controlled when the 
3D images were taken. 

Discussion 

In this work we propose a fully automatic registration 
method for high resolution 3D facial images. This method 
combines automatic landmark annotation and TPS -based 
registration. Pevious TPS based automatic registration 
methods suffer from low accuracy in landmark localization 
[40,41], and are not suitable for anthropometric studies. 
For the problem of landmark localization, most time- 
honored solutions deal with only 2.5D data, leaving out the 
texture information. In particular, Perakis et al. described a 
method that made use of a comprehensive list of local 
shape descriptors, and achieved a precision of around 
4 mm [63]. Szeptycki et al. combined curvature analysis 
with a generic face model in a coarse-to-fine workflow, 
which enabled rotation invariant 3D landmark annotation 
at a precision of around 10 mm [37]. On the other hand, 
D'Hose et al. made use of the Gabor wavelets to extract 
curvature information for coarse landmark localization, 
followed by an ICP-based fine mapping [47]. This study 
achieved an overall precision level of a bit over 3 mm [47]. 
Hutton et al. developed a method called the dense surface 
model (DSM), which hybridized the ICP optimization and 
active shape model (ASM) fitting to enable the automatic 



registration of 3D facial surfaces [12]. They demonstrated 
that for the ten studied landmarks, the estimated positions 
using the DSM method have relatively small RMS errors 
(~3 mm) from the manual annotations. In this study, we 
constructed a novel PCA based landmark localization 
method, which made used of both the 3D geometric and 
2D texture information, and achieved much lower land- 
mark RMS errors, 1.7 ~ 1.8 mm on average, for a bigger 
number (17) of landmarks (Table 1). If the less salient land- 
marks, such as the earlobe tips, are excluded from the ana- 
lysis, the errors will decrease further (Table 1). The novel 
use of both shape and texture information played a key 
role in improving the landmark localization accuracy. We 
found that the positions of some salient landmarks such as 
the eye corners are ambiguous even manually when the 
texture is removed. Texture gives rich information about 
the facial anatomical layout, such as the boundaries of 
different skin/tissue types. In fact, texture is almost the 
only information source for pattern recognition in 2D im- 
ages and has been shown to give good performance. We 
projected both the shape and texture data into the 2D 
space, where the well-established PCA algorithm was used 
to detect the key landmarks. We also made use of the tex- 
ture information for detecting certain other landmarks. 
Furthermore, due to the use of simple and optimized algo- 
rithms, the landmark annotation is also very efficient and 
does not require large amounts of memory. Hundreds of 
surfaces can be annotated within several minutes on a 
standard Windows PC. It is known that PCA can give 
wrong results on multi-modal feature distributions. This is 



WW 

Figure 6 Front and profile views of the average faces. From left to right: Han Chinese male, Han Chinese Female, Uyghur female, and 
Uyghur male. 
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particular the case when the surface data is taken under 
very different poses or light conditions, or image data con- 
tains non-Gaussian noises like spikes, holes and nevus. 
Under such conditions, PCA gives no guarantee that the 
modes of maximum variance (the principal component 
axis) are accounted for by the features of interest rather 
than noises. However, such problems are minimized in our 
study. First of all, for the purpose of anthropometric ana- 
lyses, all surface images are supposed to be taken in labora- 
tory environment where pose and light condition are well 
controlled to suppress unwanted variations. Moreover, sev- 
eral pre-processing and classification method were adopted 
to increase the specificity: 1. Median filter is used to reduce 
non-Gaussian noises; 2, both local patch vectors are nor- 
malized to have zero mean and unit length to eliminate 
bias to either modality; 3, the reconstruction error e was 
used along with the Mahalanobis distance for pattern 
classification. After all, the landmark recognition errors 
presented in Table 1 were assessed across all the individ- 
uals in the test panels. Unless obvious imaging errors were 
observed and the corresponding images removed, the er- 
rant landmark localizations were not specifically filtered 
from the test panels. The low average recognition errors 
therefore support the robustness of our method. The land- 
mark localization may be further improved in the perform- 
ance. For example, in the PCA based localization, one may 
apply bigger patch sizes but use a subset of points within 
each patch to construct the signature vector P. This may 
reduce the redundant information and result in better ac- 
curacy or efficiency. On the other hand, the gain in accur- 
acy may also be partially attributed to the higher image 
resolution of our data (-30,000 vertices per surface on 
average) compared to the previous work (-10,000 vertices 
per surface). Furthermore, we also proposed a new proto- 
col for the TPS -based registration, whereby the TPS 
warping was only applied to the reference face while the 
sample faces remained undeformed and thus avoided the 
step of inverse TPS warping, thereby further increasing the 
efficiency of our method. It is interesting to note that both 
the automatic landmark annotation and the TPS based 
registration steps work equally well for two different eth- 
nicities, namely Han Chinese and Uyghur, in spite of the 
fact that they are substantially different in both genetic 
background and facial appearance. Han Chinese are repre- 
sentative of East Asian populations while Uyghur is an 
ancient admixture population whose ancestries came from 
both East Asians and Caucasians (European people) [64]. 
As a result, Uyghur participants exhibited many Caucasian 
facial features such as sunken eyes and high nose ridge, 
etc. (Figure 6). This method was also tested on other 
ethnic groups and showed consistent robustness (data 
not shown). Such ethnicity independency is very im- 
portant when this method is used to study the cross 
population facial morphological variations in humans. 



It should be noted that the aim of this study is not to 
propose a general scheme of 3D surface registration. Ra- 
ther, our method combines various pragmatic solutions 
to construct an optimized pipeline for high-throughput 
registration of dense 3D facial images. To the authors' 
knowledge, this is the only fully automatic non-rigid 
registration method that aligns dense 3D face images at 
a landmark accuracy of - 2 mm. In fact, high resolution 
non-rigid registration methods that maximize the anatom- 
ical correspondence can greatly promote the biological 
and medical researches of the corresponding organs/fea- 
tures. A formal TPS based 3D face registration method, 
DSM, has revealed great potential in series of medical gen- 
etic studies of face dysmorphology [8-13]. In the field of 
brain research, efficient non-rigid methods, such as HAM- 
MER [6], TPS-HAMMER [65] and Brainaligner [5] were 
developed for specific image data, and successfully applied 
in detection of aging/disease induced brain morphological 
changes [66] and delineation of ultrastructures [67] and 
neuronal circuits [68] of brain. Given the high definition 
and accuracy, our method may have many potential appli- 
cations, such as quantitative characterization of human 
face diversity and detection of genetic /environmental fac- 
tors that can induce facial shape changes. Furthermore, as 
this method is robust to diverse ethnicities, it is particu- 
larly suitable for studying evolution and divergence of 
human face among different populations. A standalone 
software has been implemented for this method, and is 
freely available for academic use upon request. 

In the future, the anatomic correspondence can be 
further improved by including additional features such 
as the eyebrows, eyelid lines, and lip lines as landmarks. 
These features may provide discrimination power to- 
wards different facial expressions. 

Conclusions 

In summary, this study proposes a new scheme to build ac- 
curate and robust anatomical correspondence across dense 
surfaces of 3D facial images; and it was implemented into a 
fully automatic and efficient registration package. This 
method enables high-throughput capture and analysis of 
the wide ranging and yet fine detailed variations within hu- 
man facial morphology. Such comprehensive and high 
resolution phenotypic data should be valuable in anthropo- 
logical, disease diagnosis, and forensic studies of human 
facial morphology. 
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Additional file 1: Appendix. 

Additional file 2: Figure SI. Mean r value of different human groups 
with respect to different R values. For each group, 50 faces were 
analyzed. For each face, the nose tip was first annotated and all 
neighboring points within distance R were used to calculate the r value. 
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